The Perceptibility of Video Artifacts: a Perspective from Color Science
Abstract
In the world of digital image and video processing, encoding and/or compression errors are often assessed in terms of the signal dimensions (e.g., RGB or YCbCr) using quantities such as RMS deviation or signal-to-noise ratio, with little or no regard for the radiometric linearity of the quantities or their ultimate appearance to observers. When visual models are used, it is usually with the aim of predicting visibility thresholds rather than the perceived magnitudes of clearly visible artifacts. Color scientists quantify the visibility of color changes using color difference metrics (e.g., CIE ΔE*ab, ΔE94, and ΔE2000) computed in the CIELAB color space. Such metrics have been largely developed and optimized for specifying the magnitudes of small, suprathreshold color differences of uniform object colors in illuminated environments. Extensions to both approaches are required to accurately model and predict the perception of differences in images and video. This paper provides an overview of error metrics used in colorimetry and of their extension, with models of spatial and temporal vision, to imaging applications. It also examines some of the important viewing-condition variables for images and their surroundings and how they are addressed with color and image appearance models. Lastly, some recent and ongoing research on the perception of image differences and quality is described, including issues in visual equivalence and image content. This paper is a logical extension of the VPQM 2007 presentation, “A Color Scientist Looks at Video,”[1] which stressed the importance of accuracy in the encoding, processing, and display of video content. Here the focus is on the perceptibility of artifacts, such as those introduced by compression, noise, or other errors, as well as of differences that are purposefully introduced into images by enhancement algorithms.

1. DIFFERENCES IN IMAGES AND VIDEO

Researchers in image and video processing often find it necessary to quantify the difference between two images. When the differences are introduced by image degradations such as compression artifacts, chrominance subsampling, or errors in decoding for display, the difference measure can be considered a form of image quality metric that quantifies the visibility, and perhaps the objectionability, of the degradations. Sometimes, however, differences are desirable, such as those introduced by image or color enhancement algorithms, and the same metrics can be used to quantify them. In most cases it is the perceived difference on a display, or displays, that is of most interest. However, perceived differences on meaningful displays are very rarely actually measured and discussed. Instead, differences are often expressed in terms of the data representing the image. Such data are usually not directly proportional to perceived image color and often do not even represent the physical image veridically. This is because there are inherent linear and nonlinear relationships between the image data and the luminance of the (typically) three (RGB) display channels. These relationships are not channel-independent: the displayed R luminance, for example, often depends on all three RGB channels of image data. Most often, these relationships are not properly characterized, so there is little knowledge of the ultimate display colorimetry for given image signals. Even with this knowledge, displays and viewing conditions would have to be properly calibrated and characterized to complete the chain from image data to a visual stimulus that can be used to compute meaningful perceived differences. In addition to accurately describing display appearance prior to measuring differences, there are two types of differences that might be of interest: thresholds and magnitudes.
Threshold metrics simply describe whether or not an image difference is perceptible, often in terms of probability of detection or a just-noticeable-difference (JND) criterion. Magnitude metrics describe the perceived difference between images whose differences are clearly above threshold. Sometimes multiples of JNDs (e.g., two images differ by 42 JNDs) are erroneously used to describe difference magnitudes. It is well accepted in perceptual science that JNDs do not scale linearly up to perceptual magnitudes.[2,3]

2. IMAGE/VIDEO PROCESSING METRICS

Image-based metrics are normally computed on the image data themselves, such as RGB or YCbCr, and have at best a tenuous relationship to perceived differences. Such metrics include root-mean-squared or mean-squared error (RMSE or MSE), peak signal-to-noise ratio (PSNR), and the structural similarity index (SSIM). MSE is simply the squared difference between corresponding image elements averaged across the entire image (one channel at a time).[4] RMSE is the square root of MSE and puts the error metric back into the same units as the image data. PSNR is the ratio of the square of the maximum possible image data value to the MSE, expressed in decibels.[4,5] SSIM is sometimes described as being perceptually based; however, examination of the formula shows that it has no relation to perception.[6] These metrics are not capable of describing perceived image differences, mainly because they are not applied to perceptual dimensions such as lightness, chroma, and hue. Instead, they are applied to luminance signals alone, under the assumption that all relevant image quality differences are in luminance; to RGB or nonlinear RGB, with some simple strategy for summing the quantities across the three channels if a single metric is desired; or, in a similar way, to linear or nonlinear YCbCr.

3. COLORIMETRIC DIFFERENCE METRICS

In the world of color science and measurement, differences are measured as magnitudes on scales designed to estimate the perceptual dimensions of lightness, chroma, and hue (and sometimes others). This approach originated with Munsell's perceptual description of color space and perceptual scales.[7] Other researchers, such as Wright and MacAdam, approached the same problem from the direction of color thresholds.[8] These research paths culminated in the creation of the CIELAB and CIELUV color spaces and difference formulae in 1976.[9] Since 1976, research in color difference perception and tolerance specification has established the superiority of CIELAB over CIELUV and has focused on the creation of weighted color difference equations within the CIELAB color space. The most widely used and best performing of these are the CIE94 and CIEDE2000 color difference equations. Once images are expressed in calibrated CIELAB units of L* (lightness correlate), a* (redness-greenness), and b* (yellowness-blueness), or of L* (lightness), C*ab (chroma), and hab (hue), the simple CIELAB 1976 color difference is defined as the Euclidean distance between the two colors. For images, these differences are often averaged across the entire image, or other statistics, such as a histogram of differences or certain percentiles, are evaluated. Similar approaches can be taken with the more perceptually accurate CIE94 and CIEDE2000 formulae. These weighted equations express the differences in terms of lightness, chroma, and hue and then adjust the relative weighting of the difference dimensions depending on the location in color space. There is not space in this paper to fully present the derivation or computation of these difference equations; see [10] for details.
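As a minimal sketch of the computations just described, the Python below implements the CIE 1976 ΔE*ab and the CIE94 difference for pairs of CIELAB triples, plus a mean and a nearest-rank percentile for summarizing a per-pixel difference map. It assumes the images are already in calibrated CIELAB, treats the first argument as the reference whose chroma sets the weights, and uses the graphic-arts CIE94 constants (kL = kC = kH = 1, K1 = 0.045, K2 = 0.015), so that ΔE94 = sqrt((ΔL*/kL·SL)² + (ΔC*ab/kC·SC)² + (ΔH*ab/kH·SH)²) with SL = 1, SC = 1 + K1·C*ab, SH = 1 + K2·C*ab. The function names and pixel values are hypothetical, and CIEDE2000 is left out for brevity.

```python
import math


def delta_e_76(lab1, lab2):
    """CIE 1976 ΔE*ab: Euclidean distance between two (L*, a*, b*) triples."""
    return math.dist(lab1, lab2)


def delta_e_94(ref, sample, kL=1.0, kC=1.0, kH=1.0, K1=0.045, K2=0.015):
    """CIE94 color difference; the chroma weights are computed from the reference."""
    L1, a1, b1 = ref
    L2, a2, b2 = sample
    C1 = math.hypot(a1, b1)  # C*ab of the reference
    C2 = math.hypot(a2, b2)
    dL = L1 - L2
    dC = C1 - C2
    # ΔH*ab squared, recovered from ΔE*ab² = ΔL*² + ΔC*ab² + ΔH*ab²;
    # clamped at zero to absorb floating-point round-off.
    dH_sq = max((a1 - a2) ** 2 + (b1 - b2) ** 2 - dC ** 2, 0.0)
    SL = 1.0
    SC = 1.0 + K1 * C1
    SH = 1.0 + K2 * C1
    return math.sqrt((dL / (kL * SL)) ** 2
                     + (dC / (kC * SC)) ** 2
                     + dH_sq / (kH * SH) ** 2)


def summarize(diffs, pct=95):
    """Mean and nearest-rank percentile of a list of per-pixel differences."""
    ordered = sorted(diffs)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return sum(ordered) / len(ordered), ordered[rank - 1]


# Toy three-pixel "images" already expressed in CIELAB (hypothetical values).
reference = [(50.0, 0.0, 0.0), (50.0, 40.0, 30.0), (70.0, -20.0, 10.0)]
test = [(52.0, 0.0, 0.0), (50.0, 44.0, 33.0), (70.0, -20.0, 14.0)]

de76 = [delta_e_76(r, t) for r, t in zip(reference, test)]
de94 = [delta_e_94(r, t) for r, t in zip(reference, test)]
print(de76)             # ΔE*ab of 2, 5 and 4 units
print(summarize(de76))  # mean of about 3.667, 95th percentile 5.0
print(de94)             # the saturated second pixel drops from 5.0 to about 1.54
```

CIE94 leaves the neutral first pixel unchanged (its weights reduce to 1 when C*ab = 0) and shrinks the differences of the chromatic pixels, most strongly for the high-chroma second pixel, which is the chroma-dependent weighting the text refers to.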
It is likely that the CIEDE2000 equation is more complex than required for imaging applications, but the CIE94 equation probably represents a significant advance over a simple CIELAB 1976 difference.

4. VISUAL THRESHOLD MODELS

There has been significant research on video quality and video quality metrics, often aimed at the creation and optimization of encoding/compression/decoding algorithms such as MPEG-2 and MPEG-4. By analogy, the still-image visible differences predictor (VDP) of Daly[11] is quite applicable to predicting the visibility of artifacts introduced into still images by JPEG compression. The Daly model was designed to predict the probability of detecting an artifact (i.e., whether the artifact is above the visual threshold). The CVDM metric[12] extended the Daly VDP to include all three dimensions of color. Other metrics have been published to examine the probability of detection of artifacts in video (i.e., threshold metrics). Two well-known video quality models, the Sarnoff JND model and the NASA DVQ model, are briefly described below to contrast their capabilities with those of models aimed at predicting image difference magnitudes and appearance. The Sarnoff JND model is the basis of the JNDmetrix software package and related video quality hardware. The model is briefly described in a technical report published by Sarnoff[13] and more fully disclosed in other publications.[14] It is based on the multiscale model of spatial vision published by Lubin,[15,16] with some extensions for color processing and temporal variation. The Lubin model is similar in nature to the Daly model in that it is designed to predict the probability of detection of artifacts in images. These are threshold changes in images, often referred to as just-noticeable differences, or JNDs. The Sarnoff JND model has no mechanisms of chromatic or luminance adaptation.
The input to the Sarnoff model must first be normalized (which can be considered a very rudimentary form of adaptation). The temporal aspects of the Sarnoff model are also not aimed at predicting the appearance of video sequences, but rather at predicting the detectability of temporal artifacts. As such, the model uses only two frames (four fields) in its temporal processing. Thus, while it is capable of predicting the perceptibility of relatively high-frequency temporal variation in the video (flicker), it cannot predict the visibility of low-frequency variations, which would require an appearance-oriented, rather than JND-oriented, model. While it is well accepted in the vision science literature that JND predictions are not linearly related to suprathreshold appearance differences, it is certainly possible to use a JND model to try to predict suprathreshold image differences, and the Sarnoff JND model has been applied with some success to such data. A similar model, the DVQ (Digital Video Quality) metric, has been published by Watson[17] and Watson et al.[18] of NASA. The DVQ metric is similar in concept to the Sarnoff JND model but significantly different in implementation. Its spatial decomposition is based on the coefficients of a discrete cosine transform (DCT), making it amenable to hardware implementation and likely making it particularly good at detecting artifacts introduced by DCT-based video compression algorithms. It also has a more robust temporal filter that should be capable of predicting a wider array of temporal artifacts. Like the Sarnoff model, the DVQ metric is aimed at predicting the probability of detection of threshold image differences. The DVQ model also includes no explicit appearance processing, whether through spatial or temporal adaptation or through correlates of appearance attributes.

5. SPATIAL COLOR DIFFERENCE MODELS

Another approach to incorporating the properties of spatial vision into image difference metrics is to combine spatial filtering of the images with the computation of traditional colorimetric difference metrics. An early example of this process is the S-CIELAB model of Zhang and Wandell.[19] Johnson and Fairchild[20] extended the S-CIELAB framework to include more robust spatial filtering techniques and the CIEDE2000 color difference equation. A more in-depth and theoretical approach was derived by Johnson et al. and referred to as a modular image difference metric.[21] This metric compared two images by first applying a set of three two-dimensional contrast sensitivity functions for the opponent-colors dimensions (light-dark, red-green, yellow-blue). This was followed by a spatial localization process that increased the importance of differences near edges in the scene, something observers do as well. The next step was local contrast detection, which modulated the predicted differences based on the magnitude and direction of a pixel's contrast with respect to its local background. The filtered images were then transformed into a uniform color space (typically the IPT space), and from there a map of traditional color difference components was produced and summarized with a variety of statistics. Johnson's modular image difference metric evolved into the full iCAM image appearance model.[21,22] The iCAM framework has been successfully applied to various image quality predictions, such as changes in sharpness and contrast, and has also been used to render high-dynamic-range still and video images. The framework continues to be a topic of research.

6. TEMPORAL DIMENSIONS

The S-CIELAB approach of combining spatial filtering with the CIELAB color difference metric has also been extended into the temporal domain for application to video difference issues. Two such approaches are ST-CIELAB and SV-CIELAB.
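Both extensions keep the S-CIELAB structure of filtering first and computing color differences second. The toy one-dimensional sketch below illustrates only that spatial core and is not S-CIELAB itself: it blurs the CIELAB channels directly with hypothetical Gaussian widths (wider for the chromatic channels, to mimic their lower spatial acuity), whereas S-CIELAB filters in an opponent color space with kernels derived from human spatial sensitivity data before converting to CIELAB. The function names, sigma values, and test data are assumptions for illustration.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at three standard deviations."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(values, sigma):
    """Convolve a 1-D signal with a Gaussian, replicating edge samples at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(values)
    return [
        sum(w * values[min(max(i + k - radius, 0), n - 1)] for k, w in enumerate(kernel))
        for i in range(n)
    ]


# Hypothetical blur widths in pixels, ordered (L*, a*, b*): chroma is blurred more.
SIGMAS = (0.5, 2.0, 2.0)


def filtered_mean_delta_e(row1, row2, sigmas=SIGMAS):
    """Mean CIE 1976 ΔE*ab between two rows of (L*, a*, b*) pixels after per-channel blurring."""
    def filt(row):
        channels = zip(*row)  # split pixels into L*, a*, b* sequences
        blurred = [blur(list(c), s) for c, s in zip(channels, sigmas)]
        return list(zip(*blurred))  # reassemble per-pixel triples
    f1, f2 = filt(row1), filt(row2)
    return sum(math.dist(p, q) for p, q in zip(f1, f2)) / len(f1)


# A uniform gray row versus the same row with a fine red-green stripe added:
# a* alternates between +10 and -10 from pixel to pixel (hypothetical data).
n = 32
reference = [(50.0, 0.0, 0.0)] * n
distorted = [(50.0, 10.0 if i % 2 == 0 else -10.0, 0.0) for i in range(n)]

pixelwise = sum(math.dist(p, q) for p, q in zip(reference, distorted)) / n
filtered = filtered_mean_delta_e(reference, distorted)
print(pixelwise)  # 10.0: every pixel differs by 10 units of a*
print(filtered)   # far smaller; the residue comes mostly from the row ends
```

Blurring the chromatic channels more strongly than lightness is what lets this kind of metric discount fine chromatic detail that a pixel-by-pixel ΔE*ab counts in full; the replicate-edge border rule is one arbitrary choice among several.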
ST-CIELAB[23] applied two-dimensional spatiotemporal contrast sensitivity functions to the luminance and chromatic dimensions prior to CIELAB color difference computations. The computed differences were then pooled spatially and temporally to provide the ST-CIELAB difference rating. It should be noted that in the spatial domain the ST-CIELAB filters are one-dimensional, rather than covering the two spatial dimensions represented in the modular image difference metric of Johnson.[21] Recently, Hirai et al. introduced the SV-CIELAB metric.[24] This novel metric uses filtering in the spatial and velocity (rather than temporal frequency) domains. The initial version of SV-CIELAB works only in the luminance dimension (i.e., grayscale videos), but the concept could be readily extended to all three color dimensions. Original and distorted video sequences are first converted to luminance, Y. Then the velocity of motion at each pixel location is computed. The image sequence is then filtered using the spatio-velocity CSF (SV-CSF), and CIELAB differences (in this case just L* differences) are computed between the filtered image sequences. Results of a psychophysical experiment showed that the SV-CIELAB metric performed significantly better than ST-CIELAB, S-CIELAB, CIELAB alone, PSNR in CIELAB, and SSIM.[24] Interestingly, SSIM performed reasonably well, since the image set was limited to one dimension and a relatively small sampling of videos. The iCAM framework has also been applied to video rendering via temporal adaptation to illustrate the change in appearance of video sequences over time. However, it has not been implemented and tested with spatio-temporal or spatio-velocity filtering; that remains for future research.

7. FUTURE DIRECTIONS & CONCLUSION

It is clear that, while much has been accomplished in the domain of video quality and difference metrics, much remains to be understood, modeled, and tested in realistic viewing situations.
One thing is certain, however: no significant improvement in predictions can be made without proper and accurate colorimetric calibration and characterization of video displays, and without the use of perceptual correlates, rather than image data dimensions, as the basis for difference and quality metrics. Other interesting and important problems include adaptation to complex environments, different types of filtering techniques, the combination of color difference and color appearance models, application to high-dynamic-range and wide-color-gamut display systems, dependency on image content, and the rather new concept of visual equivalence.

The human visual system is very complex in how it adapts to the viewing environment. For example, there are well-documented adaptation phenomena for color, space, time, spatial frequency, and temporal frequency. There is also a less well documented, but very real, adaptation to noise in images.[25] The iCAM framework and its extensions have been applied to many of these dimensions of adaptation (mainly color and space) to predict image appearance and differences.[21,22] Future work will have to incorporate the other adaptation dimensions.

At the most recent IS&T/SID Color Imaging Conference, it was quite clear that the worlds of color science and image quality could benefit from more cross-pollination. In addition to the SV-CIELAB model that was introduced there,[24] there were interesting papers on the use of an adaptive bilateral filter for predicting color image differences[26] and on the application of image quality metrics to color gamut mapping.[27] The bilateral filtering technique was a simple approach that effectively combined traditional spatial filtering with the local adaptation metric of Johnson et al.[21]

There is also significant progress yet to be made in the area of traditional color difference equations.
The most likely advances will come from combining CIE94-like weighted (but not too complex) color difference equations with the CIECAM02 color appearance model.[22] Berns and Xue[28] have recently reported promising results, and more are certain to come.

Recent video displays are pushing historical boundaries in terms of color gamut volume and dynamic range.[29] It is very likely that color difference metrics developed for historical gamuts and dynamic ranges would fail when applied to high-dynamic-range and wide-color-gamut displays.

Perceived image quality always depends significantly on image content. A review of eye-tracking research with respect to image quality assessment suggests that observers choose just one image area to attend to for any given task and look at no other parts of the image.[30] However, different observers will choose different image areas, and this might well be a source of significant inter-observer variability and image dependency in image quality and image difference experiments. Research is being planned to further probe and address this issue.

One final concept that could be of great use in image and video quality assessment is that of visual equivalence.[31,32] Images can be considered visually equivalent either when the pixel data result in displays that are imperceptibly different on a pixel-by-pixel basis, or when the subject matter is rendered in such a way that the objects in the scene look appropriate even though they might be physically very different from the original. For example, when a scene is rendered using computer graphics, great pains can be taken to ensure that every reflection and specular highlight in the image is a perfect physical match to the optics of the modeled scene.
Alternatively, approximations could be made that produce reflected highlights and scene elements that only approximate physical reality while appearing completely plausible and not affecting the perception of material properties. A pixel-by-pixel comparison of such images will show very large differences that observers tend to overlook unless those differences are specifically brought to their attention. A model of visual equivalence (i.e., all the objects in the scene look right), however, would make a prediction that matches human observation. Clearly, deriving such a model is just one of the many challenges in the field of image and video quality assessment.